In this exercise, you will explore crime data from the District of Columbia (a.k.a., Washington, DC) from October 2019 to October 2021. Each crime record includes the geographic coordinates of the crime (recorded in EPSG 4326), the date and time it occurred, whether it was a violent or property crime, and the type of offense. Your job will be to generate maps of these crimes across the District.
This assignment will give you an opportunity to test your sf, purrr::map, and tmap skills!
The points allotted for each question are provided in highlighted red bold text (e.g., [1.0]) within the question itself. When applicable, total points for a question may represent the sum of individually graded components, which are provided in red text (e.g., [1.0]).
Points may be deducted from each question’s total:
eval = TRUE (but see below);
Note: The maximum deduction is the total points value for a given question.
Pay careful attention to the format of your code – not following the rules above will cost you lots of little points (and some big ones) that can really add up!
In addition to points allotted per question, you must ensure that your R Markdown document runs out-of-the-box [25% off of the total grade] – in other words, the document will knit without error. Some tips for doing so:
source() or read_csv()).
setwd() in your code (Never use setwd()!)
eval = FALSE in the options for that chunk. Otherwise, ensure all of your code chunk options are eval = TRUE.
The functions that you may use in completing this problem set are listed below. Make sure that you know what each function does (use ?[function name] if you do not). Do not use any functions outside of this list!
In this assignment, you may use only the following R
functions (Note: If you are unclear on what a given function does,
use ? to view the help file!):
(), =, <-, ==, !, ~, $, %>%, +
Note: The packages dplyr, ggplot2, lubridate, magrittr, purrr, readr, tidyr, and tibble are all part of the tidyverse metapackage and are loaded with library(tidyverse).
1. [0.5] Save and knit this document:
problem_set_4_Evans_Brian.rmd
2. [0.5] Set up your session:
Load the sf, tmap, and tidyverse libraries;
library(sf)
library(tmap)
library(tidyverse)
tmap_mode('plot')
We will use the shapefile dataset dc_census.geojson as a
template upon which we will display crimes in the District of Columbia.
The coordinate reference system (CRS) of these data is EPSG 32618
(Universal Transverse Mercator, zone 18N). The field geoid
is the primary key for each polygon in the dataset and is the only field
that we will use for this problem set.
3. [0.75] As parsimoniously as possible, read in the census data (dc_census.geojson) [0.15], and:
Subset the fields to GEOID;
Change the field name GEOID from upper to lower case;
Assign the object to your global environment with the name census.
census <-
  st_read('data/raw/shapefiles/dc_census.geojson') %>%
  select(geoid = GEOID)
Crime data, dc_crimes.csv, were obtained from
Open Data DC.
This is a tabular dataset and each row represents the record for an
individual crime committed. Fields (columns) include:
id: The primary key for each observation;
longitude and latitude: The geographic coordinates of crimes, recorded using a handheld GPS unit set to EPSG 4326 (Geodetic CRS, World Geodetic System 1984);
date_time: The date and time a crime occurred using the format yyyy-mm-dd hh:mm:ss (International Organization for Standardization 8601);
offense: The type of crime committed;
offense_group: Categories for offenses – “property” or “violent” crimes.
Our goal in this problem set is to evaluate the number and spatial distribution of violent crimes that were committed in 2020.
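4. [1.25] Read in the crimes dataset (dc_crimes.csv) [0.15], and:
Subset the data to records where offense_group is categorized as “violent” and the crime was committed in 2020;
Remove the fields offense_group and date_time;
Convert the data to an sf object with the same CRS as census;
Assign the object to your global environment with the name crimes.
crimes <-
  read_csv('data/raw/dc_crimes.csv') %>%
  filter(offense_group == "violent",
         lubridate::year(date_time) == 2020) %>%
  # Negate the combined set: select(!a, !b) is a union of complements and
  # would keep every column
  select(!c(offense_group, date_time)) %>%
  st_as_sf(coords = c('longitude', 'latitude'),
           crs = 4326) %>%
  st_transform(st_crs(census))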
## Rows: 57400 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): id, offense, offense_group
## dbl (2): longitude, latitude
## dttm (1): date_time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
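The column-dropping and CRS steps can be sketched on toy data. This is only an illustration under stated assumptions: the records below are hypothetical (they are not rows from dc_crimes.csv), and tibble() and as.POSIXct() are used solely to build the toy table and may not be on the list of allowed functions.

```r
library(sf)
library(tidyverse)

# Hypothetical toy records (not taken from dc_crimes.csv)
toy <-
  tibble(
    id = c('x1', 'x2'),
    longitude = c(-77.03, -77.00),
    latitude = c(38.90, 38.92),
    offense_group = c('violent', 'violent'),
    date_time = as.POSIXct(
      c('2020-01-01 10:00:00', '2020-06-15 22:30:00'),
      tz = 'UTC'))

# Each `!` takes a complement and separate selections are combined as a
# union, so negating the columns one at a time keeps every column:
kept_all <-
  toy %>%
  select(!offense_group, !date_time)

# Combining the names with c() before negating drops both columns:
dropped <-
  toy %>%
  select(!c(offense_group, date_time))

# Convert to points in EPSG 4326, then project to UTM zone 18N (EPSG 32618):
toy_sf <-
  dropped %>%
  st_as_sf(coords = c('longitude', 'latitude'),
           crs = 4326) %>%
  st_transform(crs = 32618)
```

Comparing names(kept_all) with names(dropped) shows the difference between the two negation styles, and st_crs(toy_sf) confirms the projected CRS.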
Later in this problem set (questions 8 and 10), you will plot the number and spatial distribution of crimes by census tract and type of offense. To do so, we must tabulate the number of crimes for each tract.
5. [2.0] Create a shapefile that describes the number of crimes per census tract and offense:
Join the census data to the crimes dataset;
Count the number of crimes for each geoid and offense;
Reshape the data to wide format, with the fields geoid, robbery, assault w/dangerous weapon, sex abuse, and homicide;
Assign the object to your global environment with the name crimes_by_census_tract.
crimes_by_census_tract <-
  census %>%
  # st_join() matches features by location, so it takes no `by` argument
  st_join(crimes) %>%
  as_tibble() %>%
  summarize(n = n(),
            .by = c("geoid", "offense")) %>%
  filter(!is.na(offense)) %>%
  pivot_wider(
    id_cols = geoid,
    names_from = offense,
    values_from = n,
    values_fill = 0) %>%
  left_join(census, .)
## Joining with `by = join_by(geoid)`
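The tabulation above (spatial join, count by tract and offense, reshape to wide) can be traced on a small example. The following is a sketch with hypothetical data: two unit-square "tracts" and three points, all invented for illustration. The constructors st_sf(), st_sfc(), st_polygon(), rbind(), and tibble() are used only to build the toy objects and may not be on the list of allowed functions.

```r
library(sf)
library(tidyverse)

# Two hypothetical square "tracts" side by side (not the DC census tracts)
toy_tracts <-
  st_sf(
    geoid = c('a', 'b'),
    geometry = st_sfc(
      st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
      st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0)))),
      crs = 32618))

# Three hypothetical crimes: two robberies in tract a, one homicide in tract b
toy_crimes <-
  tibble(
    offense = c('robbery', 'robbery', 'homicide'),
    x = c(0.2, 0.5, 1.5),
    y = c(0.5, 0.5, 0.5)) %>%
  st_as_sf(coords = c('x', 'y'),
           crs = 32618)

toy_counts <-
  toy_tracts %>%
  # Attach each point's attributes to the polygon that contains it
  st_join(toy_crimes) %>%
  as_tibble() %>%
  # One row per tract and offense, with the number of matching points
  summarize(n = n(),
            .by = c(geoid, offense)) %>%
  filter(!is.na(offense)) %>%
  # One row per tract, one column per offense; missing combinations become 0
  pivot_wider(
    id_cols = geoid,
    names_from = offense,
    values_from = n,
    values_fill = 0)
```

In this toy case, tract a ends up with two robberies and no homicides, and tract b with no robberies and one homicide. A tract with no points at all would be dropped by the filter() step, which is why the tabulated data are joined back to the tract polygons afterwards.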
6. [0.5] Combine crimes_by_census_tract, crimes, and census into a single list object [0.2]. In doing so:
Assign the names n_crimes to crimes_by_census_tract, crime_locations to crimes, and tracts to census;
Assign the list to your global environment with the name shapes_utm.
shapes_utm <-
  list(
    "n_crimes" = crimes_by_census_tract,
    "crime_locations" = crimes,
    "tracts" = census)
7. [1.5] Using purrr::map()
for iteration [1.0], convert the CRS of
the shapefiles contained in shapes_utm to EPSG 4326 [0.4] and assign the list to your global
environment with the name shapes_4326 [0.1].
shapes_4326 <-
shapes_utm %>%
map(
~ st_transform(.x, crs = 4326))
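One way to check the result of an iteration like this is to map over the output and pull the EPSG code of each element. The sketch below uses a hypothetical two-element list of toy points (not the problem set data); st_sf(), st_sfc(), st_point(), and map_int() are used for illustration and may not be on the list of allowed functions.

```r
library(sf)
library(tidyverse)

# A hypothetical named list of sf objects in UTM zone 18N (EPSG 32618)
toy_utm <-
  list(
    first = st_sf(
      name = 'a',
      geometry = st_sfc(st_point(c(323000, 4307000)), crs = 32618)),
    second = st_sf(
      name = 'b',
      geometry = st_sfc(st_point(c(325000, 4309000)), crs = 32618)))

# Transform every element of the list; map() preserves the list names
toy_4326 <-
  toy_utm %>%
  map(
    ~ st_transform(.x, crs = 4326))

# Extract the EPSG code of each element as a named integer vector
toy_epsg <-
  toy_4326 %>%
  map_int(
    ~ st_crs(.x)$epsg)
```

Here toy_epsg holds one EPSG code per list element, under the same names as the input list, so a mismatch in any single element is easy to spot.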
8. [1.5] Using shapes_utm,
generate a static choropleth tmap of census tracts [0.5] where the fill color is determined by the
number of robberies committed [1.0].
tm_shape(shapes_utm$n_crimes) +
tm_polygons(col = 'robbery')
9. [0.5] Set the tmap mode to interactive viewing:
tmap_mode("view")
## tmap mode set to interactive viewing
10. [1.0] Using shapes_4326,
generate an interactive tmap where:
tm_basemap(
  c("OpenStreetMap",
    "Esri.WorldImagery")) +
  tm_shape(shapes_4326$n_crimes,
           name = "Robberies") +
  tm_polygons(col = 'robbery') +
  tm_shape(shapes_4326$n_crimes,
           name = "Homicides") +
  tm_dots('homicide',
          clustering = TRUE)
Extra credit! [0.25]
Modify Question 10 such that the polygons are semi-transparent
(Note: I have not taught transparency yet, but you can find
information on how to do so with ?tm_polygons).
tm_basemap(
  c("OpenStreetMap",
    "Esri.WorldImagery")) +
  tm_shape(shapes_4326$n_crimes,
           name = "Robberies") +
  tm_polygons(col = 'robbery',
              alpha = 0.5) +
  tm_shape(shapes_4326$n_crimes,
           name = "Homicides") +
  tm_dots('homicide',
          clustering = TRUE)